Mixture Proportion Estimation
Mixture Proportion Estimation and PU Learning: A Modern Approach
Given only positive examples and unlabeled examples (from both positive and negative classes), we might hope nevertheless to estimate an accurate positive-versus-negative classifier. Formally, this task is broken down into two subtasks: (i) Mixture Proportion Estimation (MPE)---determining the fraction of positive examples in the unlabeled data; and (ii) PU-learning---given such an estimate, learning the desired positive-versus-negative classifier. Unfortunately, classical methods for both problems break down in high-dimensional settings. Meanwhile, recently proposed heuristics lack theoretical coherence and depend precariously on hyperparameter tuning. In this paper, we propose two simple techniques: Best Bin Estimation (BBE) (for MPE); and Conditional Value Ignoring Risk (CVIR), a simple objective for PU-learning. Both methods dominate previous approaches empirically, and for BBE, we establish formal guarantees that hold whenever we can train a model to cleanly separate out a small subset of positive examples.
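The abstract gives only the high-level recipe; below is a minimal sketch of the BBE idea under our own simplifying assumptions (function names are ours, and the exact confidence penalty and validation protocol in the paper differ). The scores are assumed to come from a positive-versus-unlabeled classifier evaluated on held-out positive and unlabeled data; if some top-score bin contains essentially only positives, the ratio of tail frequencies there recovers the mixture proportion.

```python
import numpy as np

def bbe_estimate(scores_pos, scores_unl, delta=0.1, gamma=0.01):
    """Best Bin Estimation, sketched: pick the score threshold whose
    upper-confidence-bounded tail-frequency ratio is smallest, and
    return the ratio there as the estimated positive proportion."""
    scores_pos, scores_unl = np.asarray(scores_pos), np.asarray(scores_unl)
    n_p, n_u = len(scores_pos), len(scores_unl)
    # binomial confidence width, inflated by (1 + gamma); illustrative form
    width = (1 + gamma) * (np.sqrt(np.log(4 / delta) / (2 * n_u))
                           + np.sqrt(np.log(4 / delta) / (2 * n_p)))
    best_ucb, best_ratio = np.inf, 1.0
    for c in np.unique(scores_pos):
        q_p = np.mean(scores_pos >= c)  # empirical P(score >= c | positive)
        q_u = np.mean(scores_unl >= c)  # empirical P(score >= c | unlabeled)
        ucb = (q_u + width) / q_p       # q_p >= 1/n_p since c is a positive score
        if ucb < best_ucb:
            best_ucb, best_ratio = ucb, q_u / q_p
    return float(min(best_ratio, 1.0))
```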
Mixture Proportion Estimation Beyond Irreducibility
Zhu, Yilun, Fjeldsted, Aaron, Holland, Darren, Landon, George, Lintereur, Azaree, Scott, Clayton
The task of mixture proportion estimation (MPE) is to estimate the weight of a component distribution in a mixture, given observations from both the component and mixture. Previous work on MPE adopts the irreducibility assumption, which ensures identifiability of the mixture proportion. In this paper, we propose a more general sufficient condition that accommodates several settings of interest where irreducibility does not hold. We further present a resampling-based meta-algorithm that takes any existing MPE algorithm designed to work under irreducibility and adapts it to work under our more general condition. Our approach empirically exhibits improved estimation performance relative to baseline methods and to a recently proposed regrouping-based algorithm.
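For reference, the standard MPE formalization and the irreducibility assumption this abstract relaxes can be stated as follows (notation ours, following the usual setup in this literature):

```latex
% Samples are observed from a mixture F and from one component H, with
F \;=\; \kappa^{*}\,H \;+\; (1-\kappa^{*})\,G ,
% and the estimand is the maximal proportion of H contained in F:
\kappa^{*}(F \mid H) \;=\; \sup\bigl\{ \kappa \in [0,1] \;:\;
    F = \kappa H + (1-\kappa)\,G' \ \text{for some distribution } G' \bigr\} .
% Irreducibility assumes the other component carries no trace of H,
% i.e. \kappa^{*}(G \mid H) = 0, which makes \kappa^{*} equal the true
% mixing weight; this paper supplies a weaker sufficient condition.
```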
DEDPUL: Method for Mixture Proportion Estimation and Positive-Unlabeled Classification based on Density Estimation
This paper studies Positive-Unlabeled Classification, the problem of semi-supervised binary classification in the case when the Negative (N) class in the training set is contaminated with instances of the Positive (P) class. We develop a novel method (DEDPUL) that simultaneously solves two problems concerning the contaminated Unlabeled (U) sample: it estimates the proportions of the mixing components (P and N) in U, and it classifies U. In experiments on synthetic and real-world data, DEDPUL compares favorably with current state-of-the-art methods for both problems. We introduce an automatic procedure for DEDPUL hyperparameter optimization. Additionally, we improve two methods in the literature and achieve DEDPUL-level performance with one of them.
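A heavily simplified sketch of the density-estimation idea behind such estimators follows (our own rendering, not the published DEDPUL procedure, which works on classifier predictions and adds further corrections and hyperparameter search). Under f_u = alpha f_p + (1 - alpha) f_n with f_n >= 0, the ratio f_u / f_p is bounded below by alpha, so a robust minimum of an estimated ratio of one-dimensional prediction densities yields a proportion estimate:

```python
import numpy as np
from scipy.stats import gaussian_kde

def density_ratio_alpha(preds_pos, preds_unl, quantile=0.05):
    """Estimate the positive proportion alpha in the unlabeled sample
    from 1-D classifier predictions, via a KDE density ratio."""
    preds_pos = np.asarray(preds_pos, dtype=float)
    preds_unl = np.asarray(preds_unl, dtype=float)
    kde_p = gaussian_kde(preds_pos)   # prediction density under P
    kde_u = gaussian_kde(preds_unl)   # prediction density under U
    grid = np.linspace(preds_unl.min(), preds_unl.max(), 200)
    ratio = kde_u(grid) / np.maximum(kde_p(grid), 1e-12)
    # a low quantile of the ratio, rather than its raw minimum,
    # guards against KDE noise in the tails
    return float(np.clip(np.quantile(ratio, quantile), 0.0, 1.0))
```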
Mixture Proportion Estimation for Positive-Unlabeled Learning via Classifier Dimension Reduction
Positive-unlabeled (PU) learning considers two samples, a positive set $P$ with observations from only one class and an unlabeled set $U$ with observations from two classes. The goal is to classify observations in $U$. Class mixture proportion estimation (MPE) in $U$ is a key step in PU learning. In this paper, we show that PU learning is a generalization of local False Discovery Rate estimation. Further, we show that PU learning MPE can be reduced to a one-dimensional problem via construction of a classifier trained on the $P$ and $U$ data sets. These observations enable application of methodology from the multiple testing literature to the PU learning problem. In particular, we adapt ideas from Storey [2002] and Patra and Sen [2015] to address parameter identifiability and MPE. We prove consistency of two mixture proportion estimators using bounds from empirical process theory, develop tuning-parameter-free implementations, and demonstrate that they have competitive performance on simulated waveform data and a protein signaling problem.
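The multiple-testing connection can be made concrete. Below is our own minimal rendering (not the paper's exact estimators): reduce to one dimension with any classifier trained on P versus U, convert each unlabeled score into a p-value against the empirical positive score distribution, and apply Storey's [2002] estimator, with the positive class playing the role of the null:

```python
import numpy as np

def storey_pi0(pvalues, lam=0.5):
    """Storey's estimator of the null proportion: null p-values are
    Uniform(0,1), so the rescaled count above lambda estimates pi0."""
    pvalues = np.asarray(pvalues)
    return min(1.0, float(np.mean(pvalues > lam)) / (1.0 - lam))

def alpha_from_scores(scores_pos, scores_unl, lam=0.5):
    """Left-tail p-values of unlabeled scores against the positive
    empirical distribution: positives yield (approximately) uniform
    p-values, negatives concentrate near 0, so pi0 here equals the
    positive proportion alpha in the unlabeled sample."""
    scores_pos = np.sort(np.asarray(scores_pos))
    pvals = np.searchsorted(scores_pos, scores_unl, side="right") / len(scores_pos)
    return storey_pi0(pvals, lam)
```

This sketch assumes the classifier assigns stochastically higher scores to positive-like points; if not, flip the score sign before computing p-values.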
Classification with Asymmetric Label Noise: Consistency and Maximal Denoising
Blanchard, Gilles, Flaska, Marek, Handy, Gregory, Pozzi, Sara, Scott, Clayton
In many real-world classification problems, the labels of training examples are randomly corrupted. Most previous theoretical work on classification with label noise assumes that the two classes are separable, that the label noise is independent of the true class label, or that the noise proportions for each class are known. In this work, we give conditions that are necessary and sufficient for the true class-conditional distributions to be identifiable. These conditions are weaker than those analyzed previously, and allow for the classes to be nonseparable and the noise levels to be asymmetric and unknown. The conditions essentially state that a majority of the observed labels are correct and that the true class-conditional distributions are "mutually irreducible," a concept we introduce that limits the similarity of the two distributions. For any label noise problem, there is a unique pair of true class-conditional distributions satisfying the proposed conditions, and we argue that this pair corresponds in a certain sense to maximal denoising of the observed distributions. Our results are facilitated by a connection to "mixture proportion estimation," which is the problem of estimating the maximal proportion of one distribution that is present in another. We establish a novel rate of convergence result for mixture proportion estimation, and apply this to obtain consistency of a discrimination rule based on surrogate loss minimization. Experimental results on benchmark data and a nuclear particle classification problem demonstrate the efficacy of our approach.
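In symbols, the contamination model and the mutual-irreducibility condition described in the abstract read (notation ours, with kappa* the maximal-proportion functional defined above):

```latex
% Observed class-conditionals are mixtures of the true ones,
\tilde{P}_0 \;=\; (1-\pi_0)\,P_0 + \pi_0\,P_1 , \qquad
\tilde{P}_1 \;=\; (1-\pi_1)\,P_1 + \pi_1\,P_0 , \qquad \pi_0 + \pi_1 < 1 ,
% (a majority of observed labels are correct), and the true pair is
% mutually irreducible:
\kappa^{*}(P_0 \mid P_1) \;=\; \kappa^{*}(P_1 \mid P_0) \;=\; 0 .
```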
Mixture Proportion Estimation via Kernel Embedding of Distributions
Ramaswamy, Harish G., Scott, Clayton, Tewari, Ambuj
Mixture proportion estimation (MPE) is the problem of estimating the weight of a component distribution in a mixture, given samples from the mixture and component. This problem constitutes a key part in many "weakly supervised learning" problems like learning with positive and unlabelled samples, learning with label noise, anomaly detection and crowdsourcing. While there have been several methods proposed to solve this problem, to the best of our knowledge no efficient algorithm with a proven convergence rate towards the true proportion exists for this problem. We fill this gap by constructing a provably correct algorithm for MPE, and derive convergence rates under certain assumptions on the distribution. Our method is based on embedding distributions onto an RKHS, and implementing it only requires solving a simple convex quadratic programming problem a few times. We run our algorithm on several standard classification datasets, and demonstrate that it performs comparably to or better than other algorithms on most datasets.
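A compact sketch of the kernel-embedding idea follows (our simplification: the published algorithm selects the proportion via gradient thresholding with calibrated constants, whereas this sketch uses a fixed distance tolerance, and the kernel width is illustrative). For a candidate proportion kappa, the embedding of the reconstructed remainder (mu_F - kappa * mu_H) / (1 - kappa) stays close to the set of embeddings of genuine distributions while kappa is below the true proportion and drifts away beyond it; its distance to the convex hull of the mapped sample points is a small QP:

```python
import numpy as np
import cvxpy as cp

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian RBF kernel matrix between row-stacked samples A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def km_distance(kappa, K, n_h, n_f):
    """RKHS distance from the candidate-remainder embedding to the
    convex hull of mapped sample points, via a QP over the simplex."""
    # coefficients of v = (mu_F - kappa*mu_H)/(1-kappa) over the points
    a = np.concatenate([-kappa * np.ones(n_h) / n_h,
                        np.ones(n_f) / n_f]) / (1 - kappa)
    w = cp.Variable(n_h + n_f)
    # ||sum_k w_k phi(z_k) - v||^2 expanded through the kernel matrix
    obj = cp.quad_form(w, cp.psd_wrap(K)) - 2 * (K @ a) @ w + a @ K @ a
    cp.Problem(cp.Minimize(obj), [w >= 0, cp.sum(w) == 1]).solve()
    return max(float(obj.value), 0.0) ** 0.5

def km_estimate(X_h, X_f, sigma=1.0, tol=0.1, grid=20):
    """Return the largest candidate kappa whose remainder embedding
    stays within `tol` of the hull of the mapped samples."""
    Z = np.vstack([X_h, X_f])
    K = rbf_kernel(Z, Z, sigma) + 1e-8 * np.eye(len(Z))
    kappas = np.linspace(0.0, 0.95, grid)
    feasible = [k for k in kappas
                if km_distance(k, K, len(X_h), len(X_f)) < tol]
    return max(feasible) if feasible else 0.0
```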